Features, Regions, Gestures: Components of a Generic Gesture Recognition Engine

Authors

  • Florian Echtler
  • Gudrun Klinker
  • Andreas Butz
Abstract

In recent years, research in novel types of human-computer interaction, for example multi-touch or tangible interfaces, has increased considerably. Although a large number of innovative applications have already been written based on these new input methods, they often have significant deficiencies from a developer's point of view. Aspects such as configurability, portability and code reuse have been largely overlooked. A prime example of these problems is the topic of gesture recognition. Existing implementations are mostly tied to a certain hardware platform, tightly integrated into user interface libraries and monolithic, with hard-coded gesture descriptions. Developers are therefore time and again forced to reimplement crucial application components. To address these drawbacks, we propose a clean separation between user interface and gesture recognition. In this paper, we present a widely applicable, generic specification of gestures which enables the implementation of a hardware-independent standalone gesture recognition engine for multitouch and tangible interaction. The goal is to allow the developer to focus on the user interface itself instead of on internal components of the application.

INTRODUCTION & RELATED WORK

More and more researchers and hobbyists have gained access to novel user interfaces in the last few years. Examples include tangible input devices or multitouch surfaces. Consequently, the number of applications being written for these systems is increasing steadily. However, from a developer's point of view, most of these applications still have drawbacks. Core components such as gesture recognition are usually integrated so tightly with the rest of the application that they are nearly impossible to reuse in a different context. Additionally, most applications were designed to run on a single piece of hardware only. While this approach probably suited the original developers best, it impedes others who try to build on this work. As a result, many applications for novel interactive devices are created from scratch, consuming valuable development time. For example, countless applications contain code which tries to detect the widely used multi-finger gestures for scaling and rotation, making these gestures prime candidates for a more general approach. Of course, limiting any kind of generic gesture recognition to these few examples alone would not provide a great advantage, as many more gestures exist which the developer might want to include in a hypothetical interface. Therefore, the first step in a general approach to gesture recognition is to design a formal, extensible specification of gestures. From an abstract point of view, the goal is to separate the semantics of a gesture (the intent of the user) from its syntax (the motions executed by the user).

While many graphical toolkits such as Qt, GTK+, Swing, Aqua or the Windows User Interface API exist today, all of them were originally designed with common input devices such as mouse and keyboard in mind. To some extent, issues such as multi-point input, rotation independence or gesture recognition are being addressed in recent versions or extensions of these toolkits. Examples include DiamondSpin [8], the Microsoft Surface SDK and the support for multitouch input in Windows 7 and Mac OS X. Nevertheless, none of these libraries provides any separation between the syntax and semantics of gestures.
When attempting to customize an application on a per-user basis or to adapt it to a different type of hardware, significant changes to internal components of the library or application itself are still required. Some attempts have already been made at recognizing gestures in the input stream, as opposed to simply reacting to touch/release events. Several approaches based on DiamondTouch have been presented by Wu et al. [11, 10]. A common aspect of these systems is that gesture recognition is still performed inside the application itself. Some preliminary approaches to separating the recognition of gestures from the end-user part of the application exist [6, 3]. With the exception of Sparsh-UI [5], these systems are not yet beyond the design stage. Sparsh-UI also follows a layered approach with a separate gesture server that is able to recognize some fixed gestures for rotation, scaling etc. independently of the end-user application. However, while this is a step towards a more abstracted view of gestures, the crucial aspect of gesture customization has not yet been addressed.

A FORMAL SPECIFICATION OF GESTURES

Before discussing the details of our approach, some necessary prerequisites need to be described. We assume that the raw input data generated by the input hardware has already been transformed into an abstract representation such as the popular TUIO protocol [7]. We also assume that the location data delivered by this abstract protocol has been transformed into a common reference frame, e.g., screen coordinates. These assumptions serve to hide any hardware-related differences from the gesture recognizer. Below, we will refer to data generated by the hardware as input events. Usually, every input object (e.g., hand, finger or tangible object) generates one input event for every frame of sensor data in which it is present. More details on the layered software architecture on which this approach is based can be found in [2].

The formal specification which we describe here forms the basis for a communications protocol. This protocol is used by the application to specify screen areas and the gestures which are to be recognized within them. Afterwards, the gesture recognizer uses the same protocol to notify the application when one of the previously specified gestures has been triggered by the users' motions.

Widgets and Event Handling

Before discussing the specification of gestures and events, we will briefly examine how widgets and events are handled in common mouse-based toolkits. Here, every widget which is part of the user interface corresponds to a window. While this term is mostly applied only to top-level application windows, every tiny widget is associated with a window ID. In this context, a window is simply a rectangular, axis-aligned area in screen coordinates which is able to receive events and which can be nested within another window. Due to this parent-child relationship between windows, they are usually stored in a tree. Should a new mouse event occur at a specific location, this tree is traversed starting from the root window, which usually spans the entire screen. Every window is checked as to whether it contains the event's location and whether its filters match the event's type. If both conditions are met, the check is repeated for the children of this window until the most deeply nested window matching the event is found. The event is then delivered to the event handler of this window. This process is called event capture.
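To make the capture process concrete, the following Python sketch shows a minimal window tree with axis-aligned hit testing. The Window class, its field names and the capture() helper are illustrative assumptions for this description; they do not correspond to the API of any particular toolkit.

```python
# A minimal sketch of event capture in a window tree (illustrative names only).

from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Window:
    x: float
    y: float
    width: float
    height: float
    accepted_types: Set[str] = field(default_factory=lambda: {"press", "release", "motion"})
    children: List["Window"] = field(default_factory=list)

    def contains(self, px: float, py: float) -> bool:
        # Axis-aligned hit test in screen coordinates.
        return self.x <= px <= self.x + self.width and self.y <= py <= self.y + self.height

    def matches(self, event_type: str, px: float, py: float) -> bool:
        # A window matches if its filter accepts the event type and it contains the location.
        return event_type in self.accepted_types and self.contains(px, py)


def capture(root: Window, event_type: str, px: float, py: float) -> Optional[Window]:
    """Descend from the root to the most deeply nested window that matches the event."""
    if not root.matches(event_type, px, py):
        return None
    target = root
    while True:
        child = next((c for c in target.children if c.matches(event_type, px, py)), None)
        if child is None:
            return target  # no deeper match: deliver the event to this window
        target = child
```

Event bubbling, described next, would walk back up the chain of ancestors visited during this descent until one of them accepts the event.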
However, there are occasions where this window will not handle the event. One such occasion is, e.g., a round button. Events which are located inside the rectangular window, but outside the circular button area itself, should be delivered to the parent instead. In this case, the button's event handler will reject the event, thereby triggering a process called event bubbling. The event is now successively delivered to all parent windows, starting with the direct parent, until one of them accepts and handles the event. Should the event reach the root of the tree without having been accepted by any window, it is discarded.

When we compare this commonly used method to our approach, one fundamental difference becomes apparent. Instead of one single class of event, we are dealing with two semantically different kinds of events. The first class is comprised of input events, which describe raw data generated by the input hardware.

[Figure: hierarchy of the specification's building blocks. Regions contain gestures such as "move", "tap", "rotate" and "spin", which are in turn composed of features such as Motion, BlobCount, Rotation and Scale.]
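To illustrate how the region/gesture/feature hierarchy from the figure above might be expressed in code, here is a minimal sketch. The class names follow the figure, but the constraint dictionaries and the register_region() call are hypothetical assumptions, not the protocol actually defined in the paper.

```python
# Hypothetical sketch of a region/gesture/feature specification, loosely
# following the hierarchy shown in the figure. Illustrative API only.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Feature:
    # A single measurable property of the input events inside a region,
    # e.g. "Motion", "BlobCount", "Rotation" or "Scale".
    name: str
    constraints: Dict[str, float] = field(default_factory=dict)


@dataclass
class Gesture:
    # A named combination of features; the name carries the semantics
    # ("move", "tap", "rotate", "spin"), the features describe the syntax.
    name: str
    features: List[Feature]


@dataclass
class Region:
    # A screen area for which the application registers the gestures it
    # wants to be notified about.
    outline: List[Tuple[float, float]]  # polygon in screen coordinates
    gestures: List[Gesture]


def register_region(region: Region, on_gesture: Callable[[str, dict], None]) -> None:
    """Placeholder: transmit the specification to a standalone gesture recognizer."""
    ...


# Example: a draggable, rotatable photo widget.
move = Gesture("move", [Feature("Motion"), Feature("BlobCount", {"min": 1, "max": 1})])
rotate = Gesture("rotate", [Feature("Rotation"), Feature("BlobCount", {"min": 2})])
photo = Region([(0, 0), (200, 0), (200, 150), (0, 150)], [move, rotate])

register_region(photo, lambda name, data: print(f"gesture '{name}' triggered: {data}"))
```

In such a scheme the application only names the gestures it cares about per region; how the features are extracted from the raw input events is left entirely to the recognition engine.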


Publication year: 2010